Statistical Analyses for measuring the impact of COVID-19 Vaccine Misinformation on COVID-19 Vaccine Acceptance

This notebook covers all statistical models that generate the results and figures in the main paper and the supplementary materials. Transformed survey data are used.

We use the pystan package, the Python interface to the Stan platform for statistical modeling. This package allows us to perform full Bayesian inference using Hamiltonian Monte Carlo via the No-U-Turn Sampler (NUTS).

For brevity, the analysis in this notebook looks at the UK, and can be easily repeated for the US. We begin by importing the processed data for the UK (see import_data.ipynb for more details). The src/utils.py file provides many helper functions which we will frequently use to generate tables and figures for the paper. All models in this notebook can be found in src/models.py but have been written out here to aid model description.

In [36]:
%matplotlib inline
import src.utils as ut
NUM_SAMPLES = 200
#here we have set samples to just 200 (x 4 chains), but in the paper we use 2000 samples (x 4 chains) for more credible estimates
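Note that pystan's `iter` counts both warmup and sampling iterations per chain, with warmup defaulting to half of `iter`. A quick sanity check of the number of post-warmup draws this setting yields:

```python
# pystan's `iter` includes warmup; by default warmup = iter // 2
chains, iters = 4, 200
warmup = iters // 2
total_draws = chains * (iters - warmup)
print(total_draws)  # 400 post-warmup draws in total
```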
In [29]:
df, dd = ut.import_transformed_data('dat/orb_uk')

Impact of Exposure to Misinformation

We begin by modeling the impact of exposure to (mis)information on COVID-19 vaccine acceptance. As noted in the paper, vaccine acceptance is a 4-class ordinal variable ranging from "Yes, definitely" to "No, definitely not". See the "Statistical Methods" section of the paper; specific model specifications are referenced in the code below.

Below is a function that defines the corresponding Stan model and fits it to the given data.

In [31]:
def model_impact(df, group=1., kind='self', prior_mu=1., prior_sigma=1., iters=NUM_SAMPLES):
    # Model: Ref 1, Table 3
    # Results: Table 2
    import pystan as st
    model_code = '''
                    data {
                        int<lower=0> n; //number of data points
                        int<lower=1> m; //number of conditions
                        int<lower=2> k; //number of outcomes
                        int<lower=1,upper=k> y[n,m]; //outcome per sample per condition
                    }
                    parameters {
                        real mu[k-1];
                        real<lower=0> sigma[k-1];
                        ordered [k-1] alpha[m];
                    }
                    model {
                        for (i in 1:(k-1)) {
                            mu[i] ~ normal(0, %f);
                            sigma[i] ~ exponential(%f);
                        }
                        for (i in 1:m)
                            alpha[i] ~ normal(mu, sigma);
                        for (i in 1:n)
                            for (j in 1:m)
                                y[i,j] ~ ordered_logistic(0, alpha[j]);
                    }
                '''%(prior_mu, prior_sigma)
    df = df.loc[df['Treatment']==group]
    data = {'n':df.shape[0], 'm':2, 'k':4, 'y':df[['Vaccine Intent for %s (Pre)'%kind, 'Vaccine Intent for %s (Post)'%kind]].values}
    model = st.StanModel(model_code=model_code)
    fit = model.sampling(data=data, iter=iters)
    return fit

Let's call the above function to generate the fit object when modeling the impact of exposure to misinformation (treatment group).

In [32]:
fit_impact_T = model_impact(df, group=1)
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_051c2df6945ffd129449f7712dd290c3 NOW.

The fit object's plot() method and its printed summary give a quick overview of the model parameters. In particular, we look for well-mixed chains in the trace plots and Rhat statistics close to 1.

In [33]:
fit_impact_T.plot()
print(fit_impact_T)
WARNING:pystan:Deprecation warning. PyStan plotting deprecated, use ArviZ library (Python 3.5+). `pip install arviz`; `arviz.plot_trace(fit)`)
Inference for Stan model: anon_model_051c2df6945ffd129449f7712dd290c3.
4 chains, each with iter=200; warmup=100; thin=1; 
post-warmup draws per chain=100, total post-warmup draws=400.

             mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
mu[1]        0.04    0.02   0.35  -0.65  -0.14   0.04   0.23   0.87    261   1.01
mu[2]         1.2    0.05   0.56  -0.33   0.98   1.33   1.58   2.02    149   0.99
mu[3]        1.81     0.1   0.85  -0.52   1.46   2.08   2.39   2.88     76   1.02
sigma[1]     0.51    0.05   0.46    0.1   0.24   0.39   0.64   1.55    101   1.07
sigma[2]     0.75    0.06   0.59   0.17   0.32   0.56   0.97   2.33    105    1.0
sigma[3]     0.97    0.09   0.88   0.13   0.35   0.71   1.36   2.92    102   1.02
alpha[1,1]   0.16  2.0e-3   0.04   0.09   0.14   0.16   0.19   0.24    360    1.0
alpha[2,1]  -0.09  1.5e-3   0.03  -0.15  -0.12   -0.1  -0.07  -0.04    426   0.99
alpha[1,2]   1.79  2.5e-3   0.06   1.68   1.75   1.79   1.83    1.9    480    1.0
alpha[2,2]   1.31  2.6e-3   0.04   1.22   1.29   1.32   1.35    1.4    289   1.01
alpha[1,3]   2.67  3.4e-3   0.07   2.53   2.62   2.67   2.72   2.82    462    1.0
alpha[2,3]   2.26  3.2e-3   0.06   2.14   2.22   2.27    2.3   2.38    361    1.0
lp__        -6799     0.2   2.44  -6805  -6800  -6799  -6797  -6795    152   1.02

Samples were drawn using NUTS at Wed Oct 21 19:01:17 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

A clear separation of the $\alpha$ corresponding to pre- and post-exposure is visible in the third plot, which would indicate differences after misinformation exposure. We can use some helper functions to generate the posterior statistics (parameter means to judge effect size, and 95% percentile intervals (PI) to judge "significance") of the distribution across the 4 vaccine acceptance categories from this fit object. This contributes to Table 2 of the paper.
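The mapping from posterior cutpoints to category probabilities is simple to sketch: with a zero linear predictor, the ordered-logistic model gives P(y <= k) = inv_logit(alpha_k), so each category probability is a difference of consecutive cumulative probabilities. A minimal sketch (not the actual `ut.stats_impact` implementation), using the posterior-mean pre-exposure cutpoints alpha[1,:] from the summary above:

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def category_probs(alpha):
    # Ordered logistic with zero linear predictor:
    # P(y <= k) = inv_logit(alpha_k), so P(y = k) is the difference
    # of consecutive cumulative probabilities
    cum = np.concatenate(([0.0], inv_logit(np.asarray(alpha)), [1.0]))
    return np.diff(cum)

# posterior-mean cutpoints alpha[1,:] (pre-exposure) from the summary above
probs = category_probs([0.16, 1.79, 2.67])
print(probs.round(3))  # close to the "Pre Exposure" means reported below
```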

In [50]:
fit2stats_impact_T = ut.stats_impact(fit_impact_T)
print(fit2stats_impact_T)
                                      mean      2.5%     97.5%
Pre Exposure  Yes, definitely     0.540643  0.522187  0.560689
              Unsure, lean yes    0.315931  0.297623  0.332839
              Unsure, lean no     0.078514  0.068266  0.088301
              No, definitely not  0.064912  0.056396  0.073359
Post Exposure Yes, definitely     0.476482  0.462095  0.490461
              Unsure, lean yes    0.311729  0.297121  0.326887
              Unsure, lean no     0.117550  0.106898  0.129421
              No, definitely not  0.094240  0.084757  0.104831
Post-Pre      Yes, definitely    -0.064161 -0.086019 -0.043465
              Unsure, lean yes   -0.004203 -0.025401  0.016586
              Unsure, lean no     0.039036  0.023926  0.053986
              No, definitely not  0.029328  0.015529  0.042707

Looking at the difference between post- and pre-exposure distributions, we see a significant (PI excludes 0) drop in those who would "definitely" accept the vaccine, and a significant rise in those who are unsure but lean towards no, or who definitely will not accept the vaccine. Similarly, we can repeat this analysis for exposure to factual information (control group).

In [51]:
fit_impact_C = model_impact(df, group=0)
fit2stats_impact_C = ut.stats_impact(fit_impact_C)
print(fit2stats_impact_C)
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_051c2df6945ffd129449f7712dd290c3 NOW.
WARNING:pystan:Rhat above 1.1 or below 0.9 indicates that the chains very likely have not mixed
WARNING:pystan:41 of 400 iterations ended with a divergence (10.2 %).
WARNING:pystan:Try running with adapt_delta larger than 0.8 to remove the divergences.
                                      mean      2.5%     97.5%
Pre Exposure  Yes, definitely     0.540566  0.515304  0.569765
              Unsure, lean yes    0.326975  0.300193  0.354038
              Unsure, lean no     0.085272  0.070760  0.102059
              No, definitely not  0.047186  0.035373  0.061836
Post Exposure Yes, definitely     0.546568  0.518428  0.572214
              Unsure, lean yes    0.319365  0.293458  0.344823
              Unsure, lean no     0.085369  0.068923  0.101580
              No, definitely not  0.048699  0.037624  0.060944
Post-Pre      Yes, definitely     0.006001 -0.029628  0.043697
              Unsure, lean yes   -0.007611 -0.041781  0.024433
              Unsure, lean no     0.000097 -0.022323  0.020930
              No, definitely not  0.001513 -0.014967  0.018260

Here, we do not see a significant change in the distribution of respondents across any vaccine acceptance category (PI includes 0).

Causal Risk Difference

However, as demonstrated in import_data.ipynb, it is not the case that people do not change their vaccine opinions at all. Changes can be due to recall bias or other individual effects, which "cancel out" so that the aggregate distribution in the population remains the same. This can be better understood by computing a "risk difference" of exposure to misinformation over exposure to factual information (see Equation 2 of the paper).
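To make the idea concrete, here is a toy risk-difference computation using the posterior-mean transition rows reported later in this notebook for respondents who initially said "Yes, definitely" (point estimates only; the full model propagates posterior uncertainty):

```python
import numpy as np

# P(post-exposure category | pre = "Yes, definitely"), posterior means
# for treatment (misinformation) and control (factual information)
p_treat   = np.array([0.81, 0.18, 0.01, 0.00])
p_control = np.array([0.91, 0.09, 0.00, 0.00])

# risk difference: the shift in the conditional outcome distribution caused
# by misinformation relative to factual information (cf. Equation 2)
delta_rd = p_treat - p_control
print(delta_rd.round(2))  # probability mass leaves "Yes, definitely"
```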

In [58]:
def model_impact_causal(df, kind='self', prior_mu=1., prior_sigma=1., prior_rho=1., iters=NUM_SAMPLES):
    # Model: Appendix C
    # Results: Tables S1, S2; Figures S1, S2
    import pystan as st
    model_code = '''
                data {
                    int<lower=0> n; //number of data points
                    int<lower=1> m; //number of conditions
                    int<lower=2> k; //number of outcomes
                    int<lower=1,upper=m> x_cond[n]; //treatment group
                    int<lower=1,upper=k> y_pre[n]; //pre-exposure outcome
                    int<lower=1,upper=k> y_post[n]; //post-exposure outcome
                }
                parameters {
                    real mu_alpha[k-1];
                    real mu_beta;
                    real<lower=0> sigma_alpha[k-1];
                    real<lower=0> sigma_beta;
                    real<lower=0> sigma_delta;
                    simplex[k-1] delta_delta;
                    simplex[k-1] delta[m];
                    real beta[m];
                    ordered[k-1] alpha[m];
                }
                model {
                    mu_alpha ~ normal(0, %f);
                    mu_beta ~ normal(0, %f);
                    sigma_alpha ~ exponential(%f);
                    sigma_beta ~ exponential(%f);
                    sigma_delta ~ exponential(%f);
                    {
                        vector[k-1] u;
                        for (i in 1:(k-1))
                            u[i] = 1;
                        delta_delta ~ dirichlet(%f*u);
                    }
                    for (i in 1:m){
                        beta[i] ~ normal(mu_beta, sigma_beta);
                        alpha[i] ~ normal(mu_alpha, sigma_alpha);
                        delta[i] ~ dirichlet(sigma_delta*delta_delta);
                    }
                    for (i in 1:n)
                        y_post[i] ~ ordered_logistic(beta[x_cond[i]]*sum(delta[x_cond[i]][:y_pre[i]-1]), alpha[x_cond[i]]);
                }
            '''%(prior_mu, prior_mu, prior_sigma, prior_sigma, prior_sigma, prior_rho)
    
    data = {'n':df.shape[0], 'm':2, 'k':4, 'x_cond':df['Treatment'].values+1, 
            'y_pre':df['Vaccine Intent for %s (Pre)'%kind].values, 
            'y_post':df['Vaccine Intent for %s (Post)'%kind].values}
    model = st.StanModel(model_code=model_code)
    fit = model.sampling(data=data, iter=iters)
    return fit
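The `sum(delta[...][:y_pre[i]-1])` term deserves a note: because each delta is a simplex, its partial sums rise monotonically from 0 to 1 as the pre-exposure category increases, so beta scales a single monotonic ordinal effect rather than a free coefficient per category. A small numeric sketch with made-up values:

```python
import numpy as np

delta = np.array([0.5, 0.3, 0.2])  # a simplex draw (illustrative values)
beta = 1.4                         # illustrative coefficient

def linear_predictor(y_pre):
    # y_pre in {1, ..., 4}: sum the first y_pre - 1 simplex entries,
    # giving 0 for the lowest category and 1 for the top, scaled by beta
    return beta * float(delta[:y_pre - 1].sum())

etas = [linear_predictor(y) for y in (1, 2, 3, 4)]
print(etas)  # monotonically increasing from 0.0 to beta
```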
In [40]:
fit_impact_causal = model_impact_causal(df)
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_ce16927ba6ef8fb803021d0fa421b63f NOW.
In [180]:
fit2stats_impact_causal = ut.stats_impact_causal(fit_impact_causal)
ut.unstack_df(fit2stats_impact_causal)
Out[180]:
                                       Yes, definitely       Unsure, lean yes      Unsure, lean no       No, definitely not
Control            Yes, definitely     0.91 (0.89, 0.93)     0.09 (0.07, 0.11)     0.00 (0.00, 0.00)     0.00 (0.00, 0.00)
                   Unsure, lean yes    0.17 (0.13, 0.21)     0.74 (0.70, 0.78)     0.08 (0.06, 0.10)     0.01 (0.00, 0.01)
                   Unsure, lean no     0.01 (0.00, 0.02)     0.31 (0.23, 0.40)     0.53 (0.44, 0.62)     0.15 (0.09, 0.21)
                   No, definitely not  0.00 (0.00, 0.00)     0.03 (0.01, 0.06)     0.25 (0.15, 0.37)     0.71 (0.57, 0.83)
Treatment          Yes, definitely     0.81 (0.79, 0.83)     0.18 (0.16, 0.20)     0.01 (0.01, 0.01)     0.00 (0.00, 0.00)
                   Unsure, lean yes    0.15 (0.13, 0.17)     0.62 (0.59, 0.64)     0.20 (0.17, 0.22)     0.04 (0.03, 0.05)
                   Unsure, lean no     0.01 (0.01, 0.02)     0.19 (0.16, 0.24)     0.47 (0.43, 0.51)     0.32 (0.27, 0.37)
                   No, definitely not  0.00 (0.00, 0.00)     0.02 (0.01, 0.02)     0.10 (0.07, 0.15)     0.88 (0.83, 0.92)
Treatment-Control  Yes, definitely    -0.10 (-0.14, -0.07)   0.09 (0.06, 0.12)     0.01 (0.01, 0.01)     0.00 (0.00, 0.00)
                   Unsure, lean yes   -0.02 (-0.06, 0.02)   -0.13 (-0.18, -0.08)   0.12 (0.09, 0.15)     0.03 (0.02, 0.04)
                   Unsure, lean no     0.00 (-0.00, 0.01)   -0.12 (-0.22, -0.02)  -0.06 (-0.16, 0.05)    0.17 (0.09, 0.25)
                   No, definitely not  0.00 (-0.00, 0.00)   -0.02 (-0.05, 0.00)   -0.15 (-0.27, -0.05)   0.17 (0.04, 0.31)

The table above (rows: pre-exposure intent; columns: post-exposure intent) contributes to Tables S1, S2 and Figures S1, S2 of the paper. Let's generate those figures for a simple visual interpretation using a helper function.

In [48]:
ut.plot_stats(fit2stats_impact_causal.loc['Treatment-Control']*100, oddsratio=False, title='$\\Delta_{RD}$ for UK', xlab='% Change between Treatment and Control', tick_suffix=' [POST]', label_suffix='\n[PRE]', factor=0.5)

It's evident that, overall, respondents tend to transition from higher vaccine acceptance categories to lower ones upon exposure to misinformation, relative to exposure to factual information. An alternative representation of the entire above analysis is a Sankey plot, which shows the "flow" of people pre- and post-exposure to information.

In [54]:
ut.plot_causal_flow(fit2stats_impact_causal, fit2stats_impact_T, fit2stats_impact_C)

Determinants of Vaccine Hesitancy and Susceptibility to Misinformation

Once we have established that misinformation does impact vaccine acceptance, we can look at the socio-econo-demographic determinants of this hesitancy, and if certain groups of people might be more susceptible to COVID-19 vaccine misinformation.

To that end, we define a model which determines the contribution of any variable to vaccine intent while controlling for socio-demographic information.

In [64]:
def model_socdem(df, dd, atts=[], group=None, kind='self', prior_beta=1., prior_delta=1., prior_alpha=1., iters=NUM_SAMPLES):
    # Model: Ref 2, 3, 4, 5 in Table 3
    # Results: Tables S3, S4, S5; Figures 3, 4
    import pystan as st
    import numpy as np
    from src.bayesoc import Dim #we define some helper classes to extract posterior samples easily
    cats = ['Age', 'Gender', 'Education', 'Employment', 'Religion', 'Political', 'Ethnicity', 'Income']
    if isinstance(atts, str): atts = [atts]
    for att in atts: cats += [x for x in list(df) if x[:len(att)]==att]
    outs = ['Vaccine Intent for self (Pre)', 'Vaccine Intent for self (Post)', 'Treatment']
    df = df[cats+outs].dropna()
    dims = [Dim(pi=len(dd[cat]), beta_prior=prior_beta, value=dd[cat].keys(), name=cat) for cat in cats]
    stan = [d.get_stan() for d in dims]
    code = {'data':[], 'parameters':[], 'model':[], 'output':[]}
    for key in code:
        for d in stan: code[key].append(d[key])        
    mod_cd = '''
                data {
                    int<lower=1> n; //number of data points
                    int<lower=2> k; //number of outcomes
                    int<lower=1,upper=k> y_pre[n]; //pre-exposure
                    int<lower=1,upper=k> y_post[n]; //post-exposure
                    %s
                }
                parameters {
                    %s
                    simplex[k-1] delta;
                    ordered[k-1] alpha;
                }
                model {
                    %s
                    {
                        vector[k-1] u;
                        for (i in 1:(k-1))
                            u[i] = 1;
                        delta ~ dirichlet(%f*u);
                    }
                    alpha ~ normal(0, %f);
                    for (i in 1:n)
                        y_post[i] ~ ordered_logistic((%s)*sum(delta[:y_pre[i]-1]), alpha);
                }
            '''%('\n'.join(code['data']), '\n'.join(code['parameters']), '\n'.join(code['model']), prior_delta, prior_alpha, ' + '.join(code['output']))
    
    mod_bs = '''
                data {
                    int<lower=1> n; //number of data points
                    int<lower=2> k; //number of outcomes
                    int<lower=1,upper=k> y_pre[n]; //pre-exposure
                    %s
                }
                parameters {
                    %s
                    ordered[k-1] alpha;
                }
                model {
                    %s
                    alpha ~ normal(0, %f);
                    for (i in 1:n)
                        y_pre[i] ~ ordered_logistic(%s, alpha);
                }
            '''%('\n'.join(code['data']), '\n'.join(code['parameters']), '\n'.join(code['model']), prior_alpha, ' + '.join(code['output']))
    
    data = {}
    if group is not None:
        df = df.loc[df['Treatment']==group]
        data['y_post'] = df['Vaccine Intent for %s (Post)'%kind].values
    data['n'] = df.shape[0]
    data['k'] = 4
    data['y_pre'] = df['Vaccine Intent for %s (Pre)'%kind].values        
    print('Dataframe of size:', df.shape)
    for i in range(len(cats)):
        name = dims[i].name
        data['k_%s'%name] = len(dd[cats[i]])
        data[name] = np.array(df[cats[i]].values, dtype=int)
        if data[name].min()==0: data[name] += 1
    if group is None: model = st.StanModel(model_code=mod_bs)
    else: model = st.StanModel(model_code=mod_cd)
    fit = model.sampling(data=data, iter=iters)
    return fit

Socio-demographics and Social Media Usage

First, let us explore how social media usage might contribute to pre-exposure vaccine acceptance. This means a model with the socio-demographics as predictors, alongside the social media usage variable, with the outcome being the pre-exposure vaccine intent. As before, we fit the model first.

In [65]:
fit_socdem_preexposure = model_socdem(df, dd, 'Social media')
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_17b00e7697800dd6b389f3d5b09b09bb NOW.
Dataframe of size: (4000, 12)

Next, we extract the posterior statistics for the odds-ratio (OR) of different parameters that we have considered---all socio-demographics, plus social media usage. This contributes to Table S3 of the paper.
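As a sketch of how such OR statistics arise from a fit (the actual implementation lives in `ut.stats_socdem`): exponentiate the posterior draws of a coefficient, then summarise with the mean and the 2.5%/97.5% percentiles; the effect is "significant" when the interval excludes OR = 1. The draws below are simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# simulated posterior draws of a log-odds coefficient for one predictor
# level relative to its baseline (illustrative values, not from the fit)
beta_draws = rng.normal(loc=0.36, scale=0.06, size=4000)

or_draws = np.exp(beta_draws)                   # odds-ratio scale
or_mean = or_draws.mean()
or_lo, or_hi = np.percentile(or_draws, [2.5, 97.5])
significant = bool(or_lo > 1.0 or or_hi < 1.0)  # PI excludes OR = 1
print(round(float(or_mean), 2), significant)
```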

In [179]:
stats_socdem_preexposure = ut.stats_socdem(fit_socdem_preexposure, dd, df, 'Social media', oddsratio=True)
stats_socdem_preexposure
Out[179]:
mean 2.5% 97.5% counts
Age 25-34 1.081917 0.853941 1.377030 802
35-44 1.001682 0.769990 1.251344 776
45-54 0.970100 0.758020 1.224177 690
55-64 0.758105 0.578985 0.982457 564
65+ 0.517113 0.353950 0.707797 664
Gender Female 1.433692 1.269342 1.619139 2283
Other 2.365496 1.032779 4.131663 25
Education Level-0 1.851821 1.293225 2.506882 178
Level-1 1.579464 1.271074 1.912968 1151
Level-2 1.497504 1.186259 1.860616 691
Level-3 1.215337 0.972045 1.483706 1045
Other 1.523268 1.131709 1.988931 307
Employment Unemployed 1.002085 0.754158 1.316074 221
Student 0.832167 0.599560 1.127415 198
Retired 0.709843 0.524647 0.929946 647
Other 0.908913 0.711044 1.131595 401
Religion Jewish 0.642433 0.306851 1.208983 44
Muslim 1.303824 0.899106 1.752041 151
Atheist 0.893090 0.765870 1.037151 1343
Other 1.043622 0.869038 1.263371 734
Political Labour 0.972672 0.834706 1.121197 1410
Liberal-Democrat 1.024920 0.806727 1.282859 307
SNP 0.872131 0.608899 1.220934 153
Other 1.863735 1.561429 2.250484 845
Ethnicity Black 2.074117 1.513868 2.854381 136
Asian 1.322907 1.029354 1.668032 298
Other 2.662293 1.664450 3.903022 73
Income Level-0 1.539532 1.191176 1.954636 584
Level-1 1.376619 1.102411 1.698131 790
Level-2 1.212772 0.952795 1.506839 748
Level-3 1.075571 0.872582 1.351431 956
Other 1.858569 1.348838 2.483945 247
Social media usage Less than 10 minutes per day 0.917646 0.697315 1.165071 535
10–30 minutes per day 0.907140 0.715791 1.157205 831
31–60 minutes per day 0.944701 0.714851 1.208210 623
1–2 hours per day 0.938581 0.714833 1.218117 635
2–3 hours per day 0.951097 0.696685 1.230850 383
More than 3 hours per day 0.893183 0.697183 1.153751 462

Next, we run a regression with the same predictors, but add the pre-exposure vaccine intent as an additional predictor, with the outcome now being the post-exposure vaccine intent. This allows us to find determinants of "susceptibility" to misinformation. This also contributes to Table S3 of the paper.

In [73]:
fit_socdem_susceptibility = model_socdem(df, dd, 'Social media', group=1)
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_8f7d2593f7d09925e12a1038f65f6904 NOW.
Dataframe of size: (3000, 12)
In [94]:
stats_socdem_susceptibility = ut.stats_socdem(fit_socdem_susceptibility, dd, df, 'Social media', group=1, oddsratio=True)

Instead of printing out the whole table of results, we can use the plotting helper function to do a quick visualisation. This contributes to Figure 3 of the paper.

In [247]:
ut.plot_stats(stats_socdem_preexposure, stats_socdem_susceptibility, demos=True, title='Socio-demographic Determinants in UK', title_l='Pre-Exposure', title_r='Susceptibility', xlab='Odds Ratio', factor=0.5, titlesize=22, title_loc=-0.075)

We obtain some interesting results for the UK:

  1. Older age groups (55+) are significantly more willing to accept the vaccine than 18-24 year olds (OR<1).
  2. Females are significantly less willing to accept the vaccine than males (OR>1).
  3. Vaccine acceptance tends to increase with education level (ORs significantly decrease towards the baseline of education Level-4). Something similar is observed with increasing income levels.
  4. Those who are retired are significantly more accepting of the vaccine than those who are employed (OR<1).
  5. Those who do not support any mainstream political party in the UK, i.e. support "Others", are significantly less likely to accept the vaccine than Conservatives (OR>1).
  6. Ethnic minorities are significantly less willing to accept a vaccine than Whites (OR>1).
  7. Social media usage has no significant impact on vaccine acceptance (when controlling for all of the above socio-demographics).

Sources of COVID-19 Information that are Trusted

It has been shown that trust plays a very important role when it comes to accessing public health information. We can thus perform a similar analysis to observe how trust in different sources of COVID-19 information may determine vaccine hesitancy and susceptibility to misinformation.

In [85]:
fit_trust_preexposure = model_socdem(df, dd, 'Trust:')
fit_trust_susceptibility = model_socdem(df, dd, 'Trust:', group=1)
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_6744ba266f1a3ea14c140da089e06059 NOW.
Dataframe of size: (4000, 27)
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_9e34bbd859a93c15981ec9c1c04d3f38 NOW.
Dataframe of size: (3000, 27)

Next, we compute the posterior statistics. This contributes to Table S5 of the paper. This time we compute log ORs rather than ORs: on the log scale, effects in either direction can be ranked directly by magnitude, which is easier than judging ORs against 1.
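The convenience of the log scale is easy to see: effects in opposite directions become directly comparable by absolute value. A toy ranking with illustrative log OR values (made up for this sketch, not the fitted estimates):

```python
# illustrative log OR values for a few trust variables (made up, not the
# fitted estimates): sign gives direction, |log OR| gives strength
log_ors = {
    'Trust: None of these': 1.2,       # more hesitant
    'Trust: TV news': -0.4,            # more accepting
    'Trust: Family and friends': 0.6,  # more hesitant
}
ranked = sorted(log_ors, key=lambda k: abs(log_ors[k]), reverse=True)
print(ranked)  # strongest determinant first, regardless of direction
```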

In [96]:
stats_trust_preexposure = ut.stats_socdem(fit_trust_preexposure, dd, df, 'Trust:', oddsratio=False)
stats_trust_susceptibility = ut.stats_socdem(fit_trust_susceptibility, dd, df, 'Trust:', group=1, oddsratio=False)

We can use the same helper function to plot the statistics for us. This contributes to Figure 4 of the paper.

In [225]:
ut.plot_stats(stats_trust_preexposure, stats_trust_susceptibility, title='Sources of COVID-19 Info Trusted in UK', title_l='Pre-Exposure', title_r='Susceptibility', oddsratio=False, ylabel=False, xlab='Log Odds Ratio', factor=0.5, titlesize=20)

Clearly, the largest determinant of pre-exposure vaccine acceptance is not trusting any of the mainstream sources of information. Although this pool of people isn't huge (see the sample counts on the right-hand axis of the plot), they are significantly less likely to accept the vaccine than those who trust some conventional source of information. This group of people was also more susceptible to misinformation.

Meanwhile, those who trusted TV news, government briefings, health authorities, and (perhaps surprisingly) celebrities were more willing to accept the vaccine. Those indicating trust in Family and Friends were significantly less willing to accept the vaccine, which could indicate that receiving information from informal sources does not fare positively for COVID-19 vaccine acceptance.

Reasons for Vaccine Hesitancy

Those who did not indicate that they would "definitely" accept the vaccine were asked for their reasons for hesitancy. This can provide some insight into psycho-social determinants of hesitancy and susceptibility.

In [92]:
fit_reason_preexposure = model_socdem(df, dd, 'Reason:')
fit_reason_susceptibility = model_socdem(df, dd, 'Reason:', group=1)
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_c89fa2eafc78344e80b290d30ce9a857 NOW.
Dataframe of size: (1833, 21)
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_908e4fb570dec568da3419f5c54ac780 NOW.
Dataframe of size: (1375, 21)

Next, we compute the posterior statistics and visualise them, which contributes to Table S4 and Figure 4 of the paper.

In [98]:
stats_reason_preexposure = ut.stats_socdem(fit_reason_preexposure, dd, df, 'Reason:', oddsratio=False)
stats_reason_susceptibility = ut.stats_socdem(fit_reason_susceptibility, dd, df, 'Reason:', group=1, oddsratio=False)
In [230]:
ut.plot_stats(stats_reason_preexposure, stats_reason_susceptibility, title='Reasons for COVID-19 Vaccine Hesitancy in UK', title_l='Pre-Exposure', title_r='Susceptibility', oddsratio=False, ylabel=False, xlab='Log Odds Ratio', factor=0.5, titlesize=20, title_loc=0.05)

Evidently, only the group of people indicating they will "wait until others" received the vaccine were significantly on the lower end of COVID-19 vaccine hesitancy. Those indicating they were "not at risk" or "won't be ill" probably believe the risks of accepting the vaccine outweigh its need, and thus are also less willing to accept it. Those indicating worries about vaccine safety are also more unsure about the vaccine. But the largest contribution comes from those indicating "other" reasons, which do not fall into these "expected" reasons for hesitancy.

In terms of susceptibility, the only group that is significantly more susceptible is the one which indicated that vaccine "approval may be rushed". This hints at the kind of psychological effect that exposure to the misinformation images may have had on respondents, perhaps provoking that fear of rushed approval.

What makes misinformation impactful?

After exposure to the 5 pieces of misinformation (treatment) or factual information (control), the respondents were asked 5 follow-up questions to judge their perception of each image they were shown. These questions asked them to rate, on a 5-point scale, the extent to which:

  1. The image makes them more inclined to get vaccinated: "Vaccine Intent"
  2. They agree with the information displayed: "Agree with"
  3. They believe the information to be trustworthy: "Have trust in"
  4. They will fact-check the information: "Will fact-check"
  5. They will share the image: "Will share"

Let us infer respondents' perceptions of each image they were shown.

In [103]:
def model_image_perceptions(df, group=1, prior_alpha=1., iters=NUM_SAMPLES):
    # Model: Model for self-reported image-metrics
    # Results: Figure 5
    import pystan as st
    import numpy as np
    model_code = '''
                    data {
                        int<lower=0> n; //number of data points
                        int<lower=2> k; //number of outcomes
                        int<lower=1,upper=k> y[n]; //outcome per sample
                    }
                    parameters {
                        ordered [k-1] alpha;
                    }
                    model {
                        alpha ~ normal(0, %f);
                        for (i in 1:n)
                            y[i] ~ ordered_logistic(0, alpha);
                    }
                '''%(prior_alpha)
    metrics = ['Vaccine Intent', 'Agreement', 'Trust', 'Fact-check', 'Share']
    fits = [dict() for i in range(5)]
    df = df.loc[df['Treatment']==group]
    model = st.StanModel(model_code=model_code) #compile once, reuse for every image-metric pair
    for i in range(5):
        for m in metrics:
            data = {'n':df.shape[0], 'k':5, 'y':df['Image %i:%s'%(i+1, m)].values+3}
            fits[i][m] = model.sampling(data=data, iter=iters)
    return fits
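One detail worth flagging in the function above: the `+3` shift recodes the perception responses into the 1..k range that `ordered_logistic` requires, which suggests they are stored on a centred -2..2 scale (an assumption here; check the transformed data to confirm):

```python
import numpy as np

raw = np.array([-2, -1, 0, 1, 2])  # assumed centred 5-point coding
y = raw + 3                        # recoded to Stan's 1..5 ordinal range
print(y.tolist())
```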
In [104]:
fit_image_perceptions_T = model_image_perceptions(df, group=1)
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_8d7e2844f21474522b192a473781c344 NOW.
In [105]:
fit_image_perceptions_C = model_image_perceptions(df, group=0)
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_8d7e2844f21474522b192a473781c344 NOW.
WARNING:pystan:Rhat above 1.1 or below 0.9 indicates that the chains very likely have not mixed
WARNING:pystan:100 of 400 iterations saturated the maximum tree depth of 10 (25 %)
WARNING:pystan:Run again with max_treedepth larger than 10 to avoid saturation
WARNING:pystan:Chain 4: E-BFMI = 0.0533
WARNING:pystan:E-BFMI below 0.2 indicates you may need to reparameterize your model

As before, we now extract the posterior statistics. We can visualise the mean estimate of these 5-level Likert categories for every image and image-metric using a helper function. This contributes to Figure 5 of the paper.
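The helper presumably summarises the MCMC draws into means and credible intervals. A minimal sketch of that summarisation, using synthetic draws rather than the actual fit (with real pystan output one would pass, e.g., `fit.extract()['beta']`):

```python
import numpy as np

def posterior_summary(samples, ci=0.95):
    """Mean and central credible interval from a 1-D array of MCMC draws."""
    lo = (1 - ci) / 2
    mean = float(np.mean(samples))
    lower, upper = map(float, np.quantile(samples, [lo, 1 - lo]))
    return mean, lower, upper

# Synthetic draws standing in for fit.extract()['beta']
rng = np.random.default_rng(0)
draws = rng.normal(loc=0.5, scale=0.1, size=4000)
mean, lower, upper = posterior_summary(draws)
```

Because the central interval is taken directly from the empirical quantiles of the draws, it needs no distributional assumption beyond the sampler having mixed well.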

In [124]:
stats_image_perceptions_T = ut.stats_image_perceptions(fit_image_perceptions_T)
stats_image_perceptions_C = ut.stats_image_perceptions(fit_image_perceptions_C)
In [152]:
ut.plot_image_perceptions([stats_image_perceptions_T, stats_image_perceptions_C], ['Treatment', 'Control'])

Evidently, upon seeing the misinformation images (treatment), respondents' "self-reported" vaccine intent falls more often than it rises (red bars longer than blue), while the opposite holds for the factual images (control). This is interesting because, as the first set of models showed, there was indeed a measured drop in vaccine intent after exposure to misinformation, but no aggregate measured change after exposure to factual information, even though people "self-report" otherwise. Across the remaining image metrics, respondents are more likely to agree with, trust, and share the factual information, and less likely to fact-check it, compared with the misinformation.

Which images are more impactful than others?

One interesting follow-up analysis is to use the self-reported image metrics as "features" of the images, to compute the contribution every image has in lowering the measured vaccine intent.

In [154]:
def model_image_impact(df, group=1, kind='self', prior_beta=1., prior_delta=1., prior_gamma=1., prior_alpha=1., iters=NUM_SAMPLES):
    # Model: Ref 6, Table 3
    # Results: Tables S6, S7
    import pystan as st
    import numpy as np
    model_code = '''
                    data {
                        int<lower=1> n; //number of data points
                        int<lower=1> p; //number of images
                        int<lower=1> m; //number of metrics
                        int<lower=2> k; //number of outcomes
                        int<lower=1,upper=k> y_pre[n]; //pre-exposure
                        int<lower=1,upper=k> y_post[n]; //post-exposure
                        matrix[p,m] x_img[n]; //image metrics
                    }
                    parameters {
                        vector[m] beta;
                        simplex[p] gamma;
                        simplex[k-1] delta;
                        ordered[k-1] alpha;
                    }
                    model {
                        beta ~ normal(0, %f);
                        {
                            vector[p] u_img;
                            for (i in 1:p)
                                u_img[i] = 1;
                            gamma ~ dirichlet(%f*u_img);
                        }
                        {
                            vector[k-1] u;
                            for (i in 1:(k-1))
                                u[i] = 1;
                            delta ~ dirichlet(%f*u);
                        }
                        alpha ~ normal(0, %f);
                        for (i in 1:n)
                            {
                                real b = to_row_vector(gamma)*x_img[i]*beta;
                                y_post[i] ~ ordered_logistic(b*sum(delta[:y_pre[i]-1]), alpha);
                            }
                    }
                '''%(prior_beta, prior_gamma, prior_delta, prior_alpha)
    
    metrics = ['Vaccine Intent', 'Agreement', 'Trust', 'Fact-check', 'Share']
    df = df.loc[df['Treatment']==group]
    x = np.dstack([df[['Image %i:%s'%(i+1, m) for i in range(5)]].values for m in metrics])
    data = {'n':df.shape[0], 'p':5, 'm':len(metrics), 'k':4, 'x_img':x,
            'y_pre':df['Vaccine Intent for %s (Pre)'%kind].values, 
            'y_post':df['Vaccine Intent for %s (Post)'%kind].values}
    model = st.StanModel(model_code=model_code)
    fit = model.sampling(data=data, iter=iters)
    return fit
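The `x_img` tensor above is built with `np.dstack`, which stacks the five per-metric (n, p) matrices along a new last axis, so each respondent gets one p-by-m matrix matching Stan's `matrix[p,m] x_img[n]` declaration. A minimal sketch with synthetic Likert codes in place of the survey frame:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 12, 5, 5  # respondents, images, metrics

# One (n, p) matrix of 5-level Likert responses per metric
per_metric = [rng.integers(1, 6, size=(n, p)) for _ in range(m)]

# dstack joins along a new third axis: the result has shape (n, p, m),
# i.e. one p-by-m image-metric matrix per respondent
x_img = np.dstack(per_metric)
```

This layout matters because pystan interprets the leading axis as the array dimension `[n]` and the trailing two as each Stan matrix's rows and columns.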
In [155]:
fit_image_impact_T = model_image_impact(df, group=1)
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_54a8039bd7190e970fa44907f5e701fa NOW.
In [156]:
fit_image_impact_C = model_image_impact(df, group=0)
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_54a8039bd7190e970fa44907f5e701fa NOW.

As before, we now estimate posterior statistics on $\beta$, which refers to the contribution of every image-metric, and $\gamma$, which measures the weights of every image. This contributes to Tables S6, S7 in the paper.

In [169]:
stats_image_impact_T = ut.stats_image_impact(fit_image_impact_T, oddsratio=True)
stats_image_impact_C = ut.stats_image_impact(fit_image_impact_C, oddsratio=True)
In [174]:
ut.combine_dfs(stats_image_impact_T, stats_image_impact_C, '(Treatment)', '(Control)')
Out[174]:
Value (Treatment) Value (Control)
beta[1] 43.81 (26.62, 68.81) 2.13 (0.97, 4.48)
beta[2] 2.52 (1.28, 4.58) 1.54 (0.53, 3.56)
beta[3] 0.37 (0.21, 0.65) 3.14 (1.17, 6.92)
beta[4] 1.71 (1.34, 2.20) 2.25 (1.41, 3.49)
beta[5] 0.16 (0.11, 0.23) 0.04 (0.02, 0.07)
gamma[1] 0.37 (0.29, 0.45) 0.19 (0.00, 0.39)
gamma[2] 0.26 (0.17, 0.35) 0.27 (0.10, 0.43)
gamma[3] 0.09 (0.00, 0.18) 0.09 (0.00, 0.24)
gamma[4] 0.07 (0.01, 0.16) 0.17 (0.03, 0.34)
gamma[5] 0.21 (0.13, 0.30) 0.27 (0.08, 0.43)

Observing $\gamma$, the first image in the misinformation set has the highest impact, and the fourth the lowest, on reducing acceptance of a COVID-19 vaccine; the third image in the factual-information set has the lowest impact on reducing vaccine acceptance. A qualitative assessment of these images shows that their semantic content is scientific in nature. This could indicate that scientific messaging plays a bigger role than "memetic" information in swaying COVID-19 vaccine opinion.
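The `oddsratio=True` flag used above presumably exponentiates the posterior draws of $\beta$ before summarising. A minimal sketch of that transform on synthetic draws (the helper name and draws here are illustrative, not the actual fit):

```python
import numpy as np

def oddsratio_summary(log_odds_draws, ci=0.95):
    """Exponentiate log-odds posterior draws, then summarise.
    exp() is monotone, so transforming draw-by-draw preserves quantiles:
    the interval endpoints are exp() of the log-odds quantiles."""
    or_draws = np.exp(log_odds_draws)
    lo = (1 - ci) / 2
    lower, upper = map(float, np.quantile(or_draws, [lo, 1 - lo]))
    return float(np.median(or_draws)), lower, upper

# Synthetic draws centred on an odds ratio of ~2
rng = np.random.default_rng(1)
beta_draws = rng.normal(loc=np.log(2.0), scale=0.2, size=8000)
med, lower, upper = oddsratio_summary(beta_draws)
```

Summarising with the median (rather than the mean) is the safer choice after a nonlinear transform, since the mean of the exponentiated draws is inflated by the right skew.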

Does misinformation spiral into filter bubbles?

This study shows that there is a measurable impact of COVID-19 vaccine misinformation on acceptance of a potential COVID-19 vaccine. However, being a survey study, it has many limitations. In particular, it does not capture the complex ways in which people are exposed to information on social media platforms---governed by a combination of what the platform's algorithms show them, what their friends or followers share with them, and what they choose to consume. We can therefore expect the effects to amplify (or dilute) in a natural setting. Moreover, given that information selection online tends to steer people towards content that aligns with their ideologies, a positive feedback loop can arise in which "filter bubbles" form. We wanted to investigate whether there is evidence of such effects with regard to COVID-19 vaccine misinformation and acceptance.

One may hypothesise that those who are less accepting of the vaccine have been exposed to more of the misinformation, and less of the factual information, about it. After the exposure, we asked respondents whether they had recently (in the past month) seen "similar" content circulating on the social media platforms they use, to which they could respond "yes" or "no". Based on that response, we can test this hypothesis.

In [183]:
def model_filterbubble(df, group=1, kind='self', prior_beta=1., prior_alpha=1., iters=NUM_SAMPLES):
    # Model: Model for evidence of filter-bubble effects of (mis)information exposure with regards to vaccine intent
    # Results: Table S8; Figure S3
    import pystan as st
    import numpy as np    
    model_code = '''
                    data {
                        int<lower=1> n; //number of data points
                        int<lower=1> m; //number of categories
                        int<lower=2> k; //number of outcomes
                        int<lower=1,upper=m> y_pre[n]; //pre-exposure
                        int<lower=1,upper=k> y_see[n]; //seen images
                    }
                    parameters {
                        real beta[m];
                        ordered[k-1] alpha;
                    }
                    model {
                        beta ~ normal(0, %f);
                        alpha ~ normal(0, %f);
                        for (i in 1:n)
                            y_see[i] ~ ordered_logistic(beta[y_pre[i]], alpha);
                    }
                '''%(prior_beta, prior_alpha)
    
    df = df.loc[(df['Treatment']==group) & (df['Seen such online content']!=3)] #ignoring do-not-know's
    data = {'n':df.shape[0], 'm':4, 'k':2,
            'y_pre':df['Vaccine Intent for %s (Pre)'%kind].values,
            'y_see':[i%2+1 for i in df['Seen such online content'].values]} #"yes":2, "no":1 for ordinal logit
    model = st.StanModel(model_code=model_code)
    fit = model.sampling(data=data, iter=iters)
    return fit
In [184]:
fit_filterbubble_T = model_filterbubble(df, group=1)
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_ab5e2299cdcd722f084ff3dd107ec6eb NOW.
In [185]:
fit_filterbubble_C = model_filterbubble(df, group=0)
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_ab5e2299cdcd722f084ff3dd107ec6eb NOW.

We can estimate the posterior probability of responding "yes", i.e. of having seen similar images online recently, for each pre-exposure vaccine-intent group. This contributes to Table S8 of the paper.

In [212]:
stats_filterbubble_T = ut.stats_filterbubble(fit_filterbubble_T)
stats_filterbubble_C = ut.stats_filterbubble(fit_filterbubble_C)
ut.combine_dfs(stats_filterbubble_T, stats_filterbubble_C, '(Treatment)', '(Control)', perc=True)
Out[212]:
Value (Treatment) Value (Control)
Yes, definitely 30.8 (28.5, 33.4) 38.6 (34.5, 42.4)
Unsure, lean yes 33.4 (30.6, 36.3) 29.6 (24.3, 35.6)
Unsure, lean no 31.8 (26.0, 37.7) 26.0 (16.9, 36.1)
No, definitely not 41.0 (33.7, 47.8) 22.8 (11.3, 36.7)

It appears that, indeed, as people become less accepting of the vaccine, they are more likely to report having seen similar misinformation, and less likely to report having seen similar factual information. To see this more clearly, we can compute the posterior estimates contrasted against the baseline group of "Yes, definitely", and visualise them with a helper function. This contributes to Table S8 and Figure S3 of the paper.
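With only k=2 outcomes, the ordered logit above reduces to a Bernoulli logit: P(y_see = "yes" | category m) = inv_logit(beta[m] - alpha). A minimal sketch of the baseline contrast on synthetic posterior draws (not the actual fit; `ut.stats_filterbubble` presumably does something similar):

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def yes_prob_contrasts(beta_draws, alpha_draws):
    """P('yes') per pre-exposure category, and its contrast with category 1.
    beta_draws: (draws, m) array; alpha_draws: (draws,) single cutpoint."""
    p = expit(beta_draws - alpha_draws[:, None])   # draws x categories
    delta = p[:, 1:] - p[:, [0]]                   # vs baseline "Yes, definitely"
    return p.mean(axis=0), delta.mean(axis=0)

# Synthetic draws: one cutpoint, four category effects rising with hesitancy
rng = np.random.default_rng(2)
n_draws, m = 4000, 4
alpha = rng.normal(0.8, 0.05, size=n_draws)
beta = rng.normal([0.0, 0.1, 0.2, 0.5], 0.05, size=(n_draws, m))
p_yes, delta = yes_prob_contrasts(beta, alpha)
```

Computing the contrast draw-by-draw (before averaging) is what lets the helper report a full credible interval for each difference, not just a point estimate.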

In [213]:
stats_filterbubble_T_delta = ut.stats_filterbubble(fit_filterbubble_T, contrast=True)
stats_filterbubble_C_delta = ut.stats_filterbubble(fit_filterbubble_C, contrast=True)
ut.combine_dfs(stats_filterbubble_T_delta, stats_filterbubble_C_delta, '(Treatment)', '(Control)', perc=True)
Out[213]:
Value (Treatment) Value (Control)
Unsure, lean yes 2.6 (-1.1, 6.2) -8.9 (-15.1, -1.6)
Unsure, lean no 1.0 (-4.9, 7.5) -12.6 (-22.2, -1.4)
No, definitely not 10.2 (2.2, 17.1) -15.8 (-28.2, -0.9)
In [239]:
ut.plot_stats(100*stats_filterbubble_T_delta, 100*stats_filterbubble_C_delta, oddsratio=False, title='COVID-19 Vax Misinfo "Filter Bubbles" in the UK', title_l='Treatment', title_r='Control', xlab='% Change w.r.t. "Yes, definitely"', factor=0.5, titlesize=20, title_loc=0.4)

Indeed, those indicating they would "definitely not" accept the vaccine were 10.2 percentage points more likely to have encountered similar misinformation, and 15.8 percentage points less likely to have encountered similar factual information online, compared with those who would "definitely" accept it.